8 research outputs found

Instruction replication for clustered microarchitectures

Author: Aleta Ortega Alexandre
Codina Viñas Josep M.
David Kaeli
González Colás Antonio María
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2003

This work presents a new compilation technique that uses instruction replication to reduce the number of communications executed on a clustered microarchitecture. For such architectures, the need to communicate values between clusters can result in a significant performance loss. Inter-cluster communications can be reduced by selectively replicating an appropriate set of instructions. However, instruction replication must be done carefully, since it may also degrade performance due to the increased contention it can place on processor resources. The proposed scheme is built on top of a previously proposed state-of-the-art modulo scheduling algorithm that effectively reduces communications. Results show that replication can decrease the number of communications, which results in significant speed-ups. IPC is increased by 25% on average for a 4-cluster microarchitecture and by as much as 70% for selected programs. Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Exploiting pseudo-schedules to guide data dependence graph partitioning

Author: Aleta Ortega Alexandre
Codina Viñas Josep M.
David Kaeli
González Colás Antonio María
Sánchez Navarro F. Jesús
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2002

This paper presents a new modulo scheduling algorithm for clustered microarchitectures. The main feature of the proposed scheme is that the assignment of instructions to clusters is done by means of graph partitioning algorithms that are guided by a pseudo-scheduler. This pseudo-scheduler is a simplified version of the full instruction scheduler and estimates key constraints that would be encountered in the final schedule. The final scheduling process is bi-directional and includes on-the-fly spill code generation. The proposed scheme is evaluated against previous scheduling approaches using the SPECfp95 benchmark suite. Our modeling results show that better schedules are obtained for most programs across a range of different architectures. For a 4-cluster VLIW architecture with 32 registers and a 2-cycle inter-cluster communication delay, we obtain an average speedup of 38.5%. Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

AGAMOS: A graph-based approach to modulo scheduling for clustered microarchitectures

Author: Aleta Ortega Alexandre
Codina Viñas Josep M.
González Colás Antonio María
Kaeli D
Sánchez Navarro F. Jesús
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009

This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy to distribute the workload among clusters while reducing the number of intercluster communications. Partitioning is guided by approximate schedules (i.e., pseudoschedules), which take into account all of the constraints that influence the final schedule. To further reduce the number of intercluster communications, heuristics for instruction replication are included. The proposed scheme is evaluated using the SPECfp95 programs. The described scheme outperforms a state-of-the-art scheduler for all programs and different cluster configurations. For some configurations, the speedup obtained when using this new scheme is greater than 40 percent, and for selected programs, performance can be more than doubled. Peer Reviewed. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Heterogeneous clustered VLIW microarchitectures

Author: Aleta Ortega Alexandre
Codina Josep Maria
González Colás Antonio María
Kaeli David
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

Increasing performance while at the same time reducing power consumption is a major design tradeoff in current microprocessors. In this paper, we investigate the potential of using a heterogeneous clustered VLIW microarchitecture. In the proposed microarchitecture, each cluster, the interconnection network, and the supporting memory hierarchy can run at different frequencies and voltages. Some of the clusters can then be configured to be performance-oriented and run at high frequency, while the other clusters can be configured to be low-power-oriented and run at lower frequencies, thus reducing overall consumption. For this heterogeneous design to be effective, we need to select the most suitable frequencies and voltages for each component. We propose a scheme to choose these parameters based on a model that estimates the energy consumption and the execution time of floating-point codes at compile time. Finally, we present a modulo scheduling technique based on graph partitioning that exploits the opportunities presented on heterogeneous clustered microarchitectures. Results show that the Energy-Delay product (ED2) can be significantly reduced, by 15% on average for a microarchitecture with 4 clusters and by as much as 35% for selected programs. Peer Reviewed.

Crossref

UPCommons. Portal del coneixement obert de la UPC

Instruction replication for clustered microarchitectures

Author: Aleta Ortega Alexandre
Codina Viñas Josep M.
David Kaeli
González Colás Antonio María
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

RECERCAT

Graph-partitioning based instruction scheduling for clustered processors

Author: Aleta Ortega Alexandre
Codina Viñas Josep M.
González Colás Antonio María
Sánchez Navarro F. Jesús
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2001

This paper presents a novel scheme to schedule loops for clustered microarchitectures. The scheme is based on a preliminary cluster assignment phase, implemented through graph partitioning techniques, followed by a scheduling phase that integrates register allocation and spill code generation. The graph partitioning scheme is shown to be very effective due to its global view of the whole code while the partition is generated. Results show a significant speedup when compared with previously proposed techniques. For some processor configurations, the average speedup for SPECfp95 is 23% with respect to the published scheme with the best performance. In addition, the proposed scheme is much faster (between 2 and 7 times, depending on the configuration). Peer Reviewed.

UPCommons. Portal del coneixement obert de la UPC

Graph-partitioning based instruction scheduling for clustered processors

Author: Aleta Ortega Alexandre
Codina Viñas Josep M.
González Colás Antonio María
Sánchez Navarro F. Jesús
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

RECERCAT

AGAMOS: A graph-based approach to modulo scheduling for clustered microarchitectures

Author: Aleta Ortega Alexandre
Codina Viñas Josep M.
González Colás Antonio María
Kaeli D
Sánchez Navarro F. Jesús

RECERCAT